Figure 1: (a) Blurry frame of the bicycle sequence. (b) Deblurring result of Cho et al. [7]. (c) Our result.


Abstract 

Several state-of-the-art video deblurring methods are based on a
strong assumption that the captured scenes are static. These methods
fail to deblur blurry videos in dynamic scenes. We propose a video
deblurring method that, unlike these methods, deals with the general
blurs inherent in dynamic scenes. To handle locally varying and
general blurs caused by various sources, such as camera shake, moving
objects, and depth variation in a scene, we approximate the pixel-wise
kernel with bidirectional optical flows. Therefore, we propose a
single energy model that simultaneously estimates optical flows and
latent frames to solve our deblurring problem. We also provide a
framework and efficient solvers to optimize the energy model. By
minimizing the proposed energy function, we achieve significant
improvements in removing blurs and estimating accurate optical flows
in blurry frames. Extensive experimental results demonstrate the
superiority of the proposed method on real and challenging videos for
which state-of-the-art methods fail in either deblurring or optical
flow estimation.

1. Introduction 

Motion blurs are the most common artifacts in videos recorded using
hand-held cameras. For decades, several researchers have studied
deblurring algorithms to remove motion blurs. Their methodologies
depend on whether the captured scenes are static or non-static. Early
works on single image deblurring usually assumed that the scene is
static with constant depth [5, 9, 10, 11, 25, 27]. The successful
approaches were naturally extended to video deblurring. In the work of
Cai et al. [2], a multi-image deconvolution method was proposed using
sparsity of blur kernels and clear image to handle registration
errors. However, this method only allows two-dimensional translational
camera motion, which generates uniform blur. Therefore, it cannot
handle rotational camera shake, which is the main cause of large
motion blurs [27]. To overcome this limitation, Li et al. [21]
parameterized spatially varying motions with 3x3 homographies and
could handle spatially varying blurs caused by camera rotation. In the
work of Cho et al. [4], camera motion in three-dimensional space was
estimated without the assistance of specialized hardware, and
non-uniform blurs caused by projective camera motion could be removed.
Spatially varying blurs caused by depth variation in a static scene
were handled recently in the works of Lee and Lee [19] and Paramanand
et al. [23].

However, previous approaches, which assume that the scene is static,
suffer from general blurs caused not only by camera shake but also by
moving objects and depth variations in a dynamic scene. As
parameterizing a spatially varying blur kernel in a dynamic scene is
difficult with a simple homography, kernel estimation for dynamic
scenes becomes more challenging. Therefore, several researchers have
focused on restoring dynamic scenes, and their methods are mainly
grouped into two approaches: the segmentation-based approach and the
exemplar-based approach.

Figure 2: (a) Blurry frame of a video containing a moving car. (b) Our deblurring result. (c) Our color-coded optical flow.

Segmentation-based deblurring approaches simultaneously estimate
multiple motions, multiple kernels, and associated image segments.
Cho et al. [6] proposed a method that segments images into multiple
regions of homogeneous motions and estimates the corresponding blur
kernel as a one-dimensional Gaussian kernel. Therefore, this method
cannot handle complex motions of objects and rotational motions of
cameras that generate locally varying blurs. Bar et al. [1] proposed a
layered model and segmented images into two layers (foreground and
background). In addition, they estimated a linear blur kernel
corresponding to the foreground layer. Although this method can
explicitly handle occluded regions using a layered model, the kernel
is limited to a one-dimensional box filter, and only a static camera
is allowed. Wulff and Black [28] extended the previous work of Bar et
al. by estimating the parameters for both foreground and background
motions. However, the motions within each segment are only
parameterized using the affine model, and extending the method to
multi-layered scenes is difficult because such a task requires joint
estimation of the depth ordering of the layers. In summary,
segmentation-based approaches have the advantage of handling blurs
caused by moving objects in dynamic scenes. However, parameterizing
the motions in each segment remains an issue [16]. That is, these
approaches fail to segment complex motions that vary
non-parametrically, such as the motions of people, because doing so
with the simple models used in [1, 28] is difficult.

The works of Matsushita et al. [22] and Cho et al. [7] are typical
exemplar-based approaches. These works estimate latent frames by
interpolating sharp patches that commonly exist in a long image
sequence. Because they need neither accurate segmentation nor
deconvolution, these methods avoid the ringing artifacts that
deconvolution can introduce. However, the former work cannot handle
blurs caused by moving objects. Moreover, the latter can only treat
blurs caused by slightly moving objects in dynamic scenes because it
searches for the sharp patches of a blurry patch using a globally
parameterized kernel with homography. Therefore, handling fast-moving
objects, whose motions are distinct from those of the backgrounds, is
difficult. Moreover, it degrades mid-frequency textures, such as
grasses and trees, because this method does not use deconvolution with
spatial priors but uses interpolation to restore latent frames, which
renders smooth results.

To alleviate the problems in previous works, we propose a new
generalized video deblurring method that estimates latent frames
without using global motion parametrization and segmentation. We
estimate bidirectional optical flows and use them to estimate
pixel-wise varying kernels. Therefore, we can naturally handle
coexisting blurs caused by camera shake, moving objects with complex
motions, and depth variations. However, sharp frames are required to
obtain accurate optical flows because estimating flow fields between
blurry images is difficult. In turn, accurate optical flows are
necessary to restore sharp frames. This is a typical chicken-and-egg
problem, and thus we simultaneously estimate both variables.
Therefore, we propose a new single energy model to solve our joint
problem. We also provide a framework and efficient techniques to
optimize the model. The result of our system is shown in Fig. 2, in
which the moving car is successfully restored because accurate optical
flows are jointly estimated.

By minimizing the proposed energy function, we achieve significant
improvements on numerous challenging real videos on which other
methods fail, as shown in Fig. 1. Furthermore, we estimate more
accurate optical flows than the state-of-the-art flow estimation
method that handles blurry images. These performances are demonstrated
in our extensive experiments.












Figure 3: (a) Bidirectional optical flows. (b) Piece-wise linear blur kernel at pixel location x.


2. Generalized Video Deblurring 

Most conventional video deblurring methods suffer from the coexistence
of various motion blurs in dynamic scenes because the motions cannot
be fully parameterized using global or segment-wise blur models. To
handle general blurs, we propose a new energy model using pixel-wise
kernel estimation rather than global or segment-wise parameterization.
As blind deblurring is a well-known ill-posed problem, our energy
model consists not only of data and spatial regularization terms but
also of a temporal term. The model is expressed as follows:

$$E = E_{data} + E_{temporal} + E_{spatial}, \tag{1}$$

and the details of each term in (1) are given in the following
sections.


2.1. Data Model based on Approximated Blur 

In conventional works, the motion blurs of each frame are approximated
using parametric models such as homographies and affine models
[1, 7, 21, 28]. However, these kernel approximations are valid only
when motion blurs are parameterizable within an entire frame or
segment. Therefore, pixel-wise motion and kernel estimation are
required to cope with general blurs. We approximate the pixel-wise
blur kernel using bidirectional optical flows, in accordance with
previous works [8, 16, 24].

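Specifically, under the assumption that the velocity of the motion is
constant between adjacent frames, our blur model is expressed as
follows:

$$\mathbf{B}_i(\mathbf{x}) = \frac{1}{2\tau_i} \int_0^{\tau_i} H(\mathbf{L}_i, t \cdot \mathbf{u}_{i \to i+1}) + H(\mathbf{L}_i, t \cdot \mathbf{u}_{i \to i-1}) \, dt, \tag{2}$$

where $\mathbf{u}_{i \to i+1} = (u_{i \to i+1}, v_{i \to i+1})$ and
$\mathbf{u}_{i \to i-1} = (u_{i \to i-1}, v_{i \to i-1})$ denote the
bidirectional optical flows at frame $i$. The blurry frame and the
latent frame are $\mathbf{B}_i$ and $\mathbf{L}_i$, respectively. The
camera duty cycle of the frame is $\tau_i$, which denotes the relative
exposure time [21]. We define the image warping
$H(\mathbf{L}_i, t \cdot \mathbf{u}_{i \to i+1})$, which transforms
the frame $\mathbf{L}_i$ to $\mathbf{L}_{i+t}$ when $0 < t < 1$, and
$H(\mathbf{L}_i, t \cdot \mathbf{u}_{i \to i-1})$, which transforms
the frame $\mathbf{L}_i$ to $\mathbf{L}_{i-t}$. Our bidirectional
optical flows, the duty cycle, and the corresponding piece-wise linear
kernel used in our blur model are illustrated in Fig. 3.

Figure 4: (a) Blurry frame of a video in a dynamic scene. (b) Locally varying kernel using homography. (c) Our pixel-wise varying kernel using bidirectional optical flows.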

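Although our blur kernel model is simple, it can be justified because
we treat videos with relatively short exposure times. Therefore, we
approximate the kernel as piece-wise linear using bidirectional
optical flows:

$$
k_{i,\mathbf{x}}(u, v) =
\begin{cases}
\dfrac{\delta(u v_{i \to i+1} - v u_{i \to i+1})}{2\tau_i \|\mathbf{u}_{i \to i+1}\|}, & \text{if } u \in [0, \tau_i u_{i \to i+1}],\ v \in [0, \tau_i v_{i \to i+1}] \\
\dfrac{\delta(u v_{i \to i-1} - v u_{i \to i-1})}{2\tau_i \|\mathbf{u}_{i \to i-1}\|}, & \text{if } u \in (0, \tau_i u_{i \to i-1}],\ v \in (0, \tau_i v_{i \to i-1}] \\
0, & \text{otherwise,}
\end{cases}
\tag{3}
$$

where $k_{i,\mathbf{x}}(u, v)$ is the blur kernel using bidirectional
optical flows at pixel location $\mathbf{x}$, and $\delta$ denotes the
Kronecker delta.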

Using this pixel-wise kernel approximation, we can easily manage
multiple different blurs in a frame, unlike conventional methods. The
superiority of our kernel model is shown in Fig. 4. Our kernel model
fits blurs from differently moving objects and camera shake much
better than the conventional homography-based model.
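
As a rough illustration of the piece-wise linear kernel, the following sketch rasterizes the forward and backward flow segments at a single pixel into a small discrete kernel. The function name `piecewise_linear_kernel`, the sampling density `samples`, and the nearest-integer rasterization are assumptions made for this example only; they are not taken from the method description.

```python
def piecewise_linear_kernel(u_fwd, u_bwd, tau, samples=64):
    """Rasterize a two-segment line kernel at one pixel.

    u_fwd, u_bwd: (u, v) flow vectors to the next / previous frame.
    tau: camera duty cycle in (0, 1].
    Returns a dict {(du, dv): weight} whose weights sum to one.
    """
    kernel = {}
    for flow in (u_fwd, u_bwd):
        for s in range(samples):
            # Sample t uniformly inside (0, tau); each segment carries half the mass.
            t = tau * (s + 0.5) / samples
            du = round(t * flow[0])
            dv = round(t * flow[1])
            kernel[(du, dv)] = kernel.get((du, dv), 0.0) + 0.5 / samples
    return kernel
```

With a horizontal flow of four pixels in each direction and a full duty cycle, the resulting weights lie on one row, are non-negative, and sum to one, which matches the constraints stated for the rows of the blur kernel matrix.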

Therefore, we cast the pixel-wise kernel estimation problem as an
optical flow estimation problem. Discretizing the constraint (2)
gives the following data term:

$$E_{data}(\mathbf{L}, \mathbf{u}, \mathbf{B}) = \lambda \sum_i \sum_{\partial_*} \left\| \partial_* \mathbf{K}_i(\tau_i, \mathbf{u}_{i \to i+1}, \mathbf{u}_{i \to i-1}) \mathbf{L}_i - \partial_* \mathbf{B}_i \right\|^2, \tag{4}$$

where the row vector of the blur kernel matrix $\mathbf{K}_i$
corresponding to the blur kernel at pixel $\mathbf{x}$ is the vector
form of $k_{i,\mathbf{x}}(\cdot)$, and its elements are non-negative
and sum to one. The linear operator $\partial_*$ denotes the Toeplitz
matrices corresponding to the partial (e.g., horizontal and vertical)
derivative filters. Parameter $\lambda$ controls the weight of the
data term, and $\mathbf{L}$, $\mathbf{u}$, and $\mathbf{B}$ denote the
sets of latent frames, optical flows, and blurry frames, respectively.

2.2. Temporal Coherence with Optical Flow Constraint

As discussed above, optical flows are required to estimate the
pixel-wise blur kernel. However, the proposed data term in (4) does
not contain conventional optical flow constraints such as brightness
constancy or gradient constancy. In general, such constraints do not
hold between two blurry frames. Thus, Portz et al. [24] proposed a
method to apply flow constraints between blurry images. Based on the
commutative property of shift-invariant kernels [13], the authors of
[24] convolved the approximated blur of each observed image with the
other image and assumed constant brightness between them at matched
points. However, the commutativity property does not hold in theory
when the kernel is not translation invariant. Therefore, this approach
only works when the motion is smooth enough.

To address this problem, we propose a new model that finds
correspondences between two latent sharp images to allow abrupt
changes in motions and the corresponding kernels. With this model, we
need not restrict our blur kernels to be shift invariant. Our model is
based on the conventional optical flow constraint between latent
images, that is, brightness constancy. The formulation is expressed as
follows:

$$E_{temporal}(\mathbf{L}, \mathbf{u}) = \sum_i \sum_{n=-N}^{N} \mu_n \left| \mathbf{L}_i(\mathbf{x}) - \mathbf{L}_{i+n}(\mathbf{x} + \mathbf{u}_{i \to i+n}) \right|, \tag{5}$$

where $n$ denotes the index of the neighboring frames at $i$. The
constant parameter $\mu_n$ controls the weight of each term in the
summation. We apply the robust $L^1$ norm to offer robustness against
outliers and occlusions.

Notably, a major difference between the proposed model and the
conventional optical flow estimation methods is that our problem is a
joint problem. That is, the brightness of latent frames and optical
flows need to be simultaneously estimated. Therefore, our model
simultaneously enforces the temporal coherence of latent frames and
estimates the correspondences.
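
To make the brightness-constancy cost concrete, the sketch below evaluates the absolute-difference penalty between two one-dimensional latent frames under a given flow. The one-dimensional setting, the nearest-neighbour lookup with clamping at the border, and the name `temporal_cost` are simplifying assumptions for illustration.

```python
def temporal_cost(frame_a, frame_b, flow, mu=1.0):
    """L1 brightness-constancy cost between two 1-D latent frames.

    frame_a, frame_b: lists of intensities.
    flow: per-pixel displacement from frame_a to frame_b.
    """
    last = len(frame_b) - 1
    cost = 0.0
    for x, value in enumerate(frame_a):
        # Nearest-neighbour lookup of the matched pixel, clamped to the frame.
        y = min(max(int(round(x + flow[x])), 0), last)
        cost += mu * abs(value - frame_b[y])
    return cost
```

For an impulse that moves one pixel to the right, the cost vanishes when the flow is one everywhere and is positive when the flow is zero, so minimizing the term over the flow favours the correct correspondence.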


2.3. Spatial Coherence 

To alleviate the difficulties of the highly ill-posed deblurring and
optical flow estimation problems, several researchers have emphasized
the importance of spatial regularization. Therefore, we also enforce
spatial coherence to penalize spatial fluctuations while allowing
discontinuities in both latent frames and flow fields. We assume that
the spatial priors for latent frames and optical flows are
independent. They are expressed as follows:

$$E_{spatial}(\mathbf{L}, \mathbf{u}) = \sum_i |\nabla \mathbf{L}_i| + \sum_i \sum_{n=-N}^{N} g_i(\mathbf{x}) |\nabla \mathbf{u}_{i \to i+n}|. \tag{6}$$

The first term in (6) denotes the spatial regularization term for the
latent frames. Although sparser $L^p$ norms (e.g., $p = 0.8$) fit the
gradient statistics of natural sharp images better [17, 18, 20], we
use conventional total variation (TV) based regularization
[12, 14, 16], as TV is computationally less expensive. The second term
denotes the spatial smoothness term for optical flows. We adopt
edge-map coupled TV-based regularization [15] to preserve
discontinuities in the flow fields at edges. Similar to [16], the
edge-map is expressed as follows:

$$g_i(\mathbf{x}) = \nu \exp\left( -\left( \frac{|\nabla \mathbf{L}_i^0|}{\sigma_I} \right)^2 \right), \tag{7}$$

where $\nu$ controls the scale of the edge-map, parameter $\sigma_I$
controls the weight, and $\mathbf{L}_i^0$ is an initial latent image
in the iterative optimization framework.
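
The behaviour of the edge-map can be checked with a short sketch on a one-dimensional signal. The forward-difference gradient and the default values of `nu` and `sigma_i` below are illustrative assumptions, not the settings used in the experiments.

```python
import math


def edge_map(row, nu=1.0, sigma_i=0.1):
    """Edge weight g(x) = nu * exp(-(|grad L| / sigma_i)^2) for a 1-D signal."""
    weights = []
    for x in range(len(row)):
        # Forward difference, set to zero at the last sample.
        grad = row[x + 1] - row[x] if x + 1 < len(row) else 0.0
        weights.append(nu * math.exp(-((abs(grad) / sigma_i) ** 2)))
    return weights
```

In flat regions the weight equals `nu`, so flow smoothness is fully enforced, whereas at a strong intensity edge the weight drops toward zero and a flow discontinuity is barely penalized.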

3. Optimization Framework 

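In the previous sections, we described the $E_{data}$,
$E_{temporal}$, and $E_{spatial}$ terms. When the camera duty cycle
$\tau_i$ is known, our final objective function becomes as follows:

$$
\begin{aligned}
\min_{\mathbf{L}, \mathbf{u}} \; & \lambda \sum_i \sum_{\partial_*} \left\| \partial_* \mathbf{K}_i(\tau_i, \mathbf{u}_{i \to i+1}, \mathbf{u}_{i \to i-1}) \mathbf{L}_i - \partial_* \mathbf{B}_i \right\|^2 \\
& + \sum_i \sum_{n=-N}^{N} \mu_n \left| \mathbf{L}_i(\mathbf{x}) - \mathbf{L}_{i+n}(\mathbf{x} + \mathbf{u}_{i \to i+n}) \right| \\
& + \sum_i |\nabla \mathbf{L}_i| + \sum_i \sum_{n=-N}^{N} g_i(\mathbf{x}) |\nabla \mathbf{u}_{i \to i+n}|.
\end{aligned}
\tag{8}
$$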

Unlike the work of Cho et al. [ ], which sequentially performs
multi-phase approaches, our model obtains a solution by minimizing a
single objective function. However, because the objective is
non-convex, we must adopt practical optimization methods to obtain an
approximate solution. Therefore, we divide the original problem into
two sub-problems and use conventional iterative and alternating
optimization techniques [5, 28] to minimize the non-convex objective
function. In the following sections, we introduce efficient solvers
and describe how to estimate the unknowns L and u, with one of them
being fixed.

3.1. Sharp Video Restoration 

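While the optical flows $\mathbf{u}$ are fixed, the corresponding blur
kernels are also fixed, and our objective function in (8) becomes
convex with respect to $\mathbf{L}$. It is expressed as follows:

$$
\min_{\mathbf{L}} \; \lambda \sum_i \sum_{\partial_*} \left\| \partial_* \mathbf{K}_i \mathbf{L}_i - \partial_* \mathbf{B}_i \right\|^2 + \sum_i \sum_{n=-N}^{N} \mu_n \left| \mathbf{L}_i(\mathbf{x}) - \mathbf{L}_{i+n}(\mathbf{x} + \mathbf{u}_{i \to i+n}) \right| + \sum_i |\nabla \mathbf{L}_i|.
\tag{9}
$$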


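To obtain $\mathbf{L}$, we adopt the conventional convex optimization
method in [3], and derive the primal-dual update scheme as follows:

$$
\begin{aligned}
\mathbf{s}_i^{m+1} &= \frac{\mathbf{s}_i^m + \eta_L \mathbf{A} \mathbf{L}_i^m}{\max(1, \mathrm{abs}(\mathbf{s}_i^m + \eta_L \mathbf{A} \mathbf{L}_i^m))} \\
\mathbf{q}_{i,n}^{m+1} &= \frac{\mathbf{q}_{i,n}^m + \eta_L \mu_n \mathbf{D}_{i,n} \mathbf{L}_i^m}{\max(1, \mathrm{abs}(\mathbf{q}_{i,n}^m + \eta_L \mu_n \mathbf{D}_{i,n} \mathbf{L}_i^m))} \\
\mathbf{L}_i^{m+1} &= \arg\min_{\mathbf{L}_i} \; \lambda \sum_{\partial_*} \left\| \partial_* \mathbf{K}_i \mathbf{L}_i - \partial_* \mathbf{B}_i \right\|^2 + \frac{\left( \mathbf{L}_i - \left( \mathbf{L}_i^m - \epsilon_L \left( \mathbf{A}^T \mathbf{s}_i^{m+1} + \sum_{n=-N}^{N} \mu_n \mathbf{D}_{i,n}^T \mathbf{q}_{i,n}^{m+1} \right) \right) \right)^2}{2 \epsilon_L}
\end{aligned}
\tag{10}
$$

where $m > 0$ indicates the iteration number, and $\mathbf{s}_i$ and
$\mathbf{q}_{i,n}$ denote the dual variables. Parameters $\eta_L$ and
$\epsilon_L$ denote the update steps. The linear operator $\mathbf{A}$
calculates the spatial difference between neighboring pixels, and
another operator $\mathbf{D}_{i,n}$ calculates the temporal
differences between $\mathbf{L}_i(\mathbf{x})$ and
$\mathbf{L}_{i+n}(\mathbf{x} + \mathbf{u}_{i \to i+n})$. To update the
primal variable and obtain $\mathbf{L}_i^{m+1}$ in (10), we apply the
conjugate gradient method to optimize the quadratic function.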
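
The structure of these updates (a dual step followed by a pointwise projection, then a quadratic primal step) can be seen in a heavily simplified sketch. It assumes a one-dimensional signal, an identity blur matrix, and no temporal term, so the quadratic sub-problem has a closed-form solution and conjugate gradient is not needed. The function name and the step sizes are illustrative assumptions.

```python
def tv_primal_dual(b, lam=1.0, eta=0.25, eps=0.25, iters=100):
    """Primal-dual iterations for min_L lam * ||L - b||^2 + |grad L| in 1-D."""
    n = len(b)
    latent = list(b)
    s = [0.0] * n  # dual variable of the spatial term; s[n - 1] stays zero
    for _ in range(iters):
        # Dual ascent followed by pointwise projection onto [-1, 1].
        for x in range(n - 1):
            step = s[x] + eta * (latent[x + 1] - latent[x])
            s[x] = step / max(1.0, abs(step))
        # Primal step: closed-form minimizer of the quadratic in L.
        for x in range(n):
            adj = (s[x - 1] if x > 0 else 0.0) - s[x]  # (A^T s)[x]
            z = latent[x] - eps * adj
            latent[x] = (2.0 * lam * eps * b[x] + z) / (2.0 * lam * eps + 1.0)
    return latent
```

In this reduced setting a constant signal is a fixed point of the iteration and the mean intensity is preserved, while an isolated spike is attenuated from the first iteration onward. With a real blur matrix the primal step no longer has a closed form, which is where an iterative quadratic solver becomes necessary.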


3.2. Optical Flows Estimation 


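While the latent frames $\mathbf{L}$ are fixed, the temporal coherence
term $E_{temporal}$ becomes convex but the data term $E_{data}$
remains non-convex. Therefore, we define a non-convex fidelity
function $\rho(\cdot)$ as follows:

$$
\rho(\mathbf{x}, \mathbf{u}) = \lambda \sum_i \sum_{\partial_*} \left\| \partial_* \mathbf{K}_i(\mathbf{u}_{i \to i+1}, \mathbf{u}_{i \to i-1}) \mathbf{L}_i - \partial_* \mathbf{B}_i \right\|^2 + \sum_i \sum_{n=-N}^{N} \mu_n \left| \mathbf{L}_i(\mathbf{x}) - \mathbf{L}_{i+n}(\mathbf{x} + \mathbf{u}_{i \to i+n}) \right|.
\tag{11}
$$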



Figure 5: Temporally consistent optical flows over three 
frames. 

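To find the optimized values of the optical flows $\mathbf{u}$, we
first convexify the non-convex function $\rho(\cdot)$ by applying the
first-order Taylor expansion. Similar to [16], we linearize the
function near an initial $\mathbf{u}_0$ in the iterative process as
follows:

$$\rho(\mathbf{x}, \mathbf{u}) \approx \rho(\mathbf{x}, \mathbf{u}_0) + \nabla \rho(\mathbf{x}, \mathbf{u}_0)^T (\mathbf{u} - \mathbf{u}_0). \tag{12}$$

Therefore, our approximated convex function for optical flow
estimation is expressed as follows:

$$\min_{\mathbf{u}} \; \rho(\mathbf{x}, \mathbf{u}_0) + \nabla \rho(\mathbf{x}, \mathbf{u}_0)^T (\mathbf{u} - \mathbf{u}_0) + \sum_i \sum_{n=-N}^{N} g_i(\mathbf{x}) |\nabla \mathbf{u}_{i \to i+n}|. \tag{13}$$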
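
The linearization step can be illustrated for a scalar unknown. The sketch below builds the first-order Taylor model of an arbitrary cost near an initial value; the central-difference derivative, the step `h`, and the name `linearize` are assumptions for this example, since an actual implementation would use analytic image derivatives.

```python
def linearize(rho, u0, h=1e-4):
    """Return the first-order Taylor model of a scalar function rho near u0."""
    slope = (rho(u0 + h) - rho(u0 - h)) / (2.0 * h)  # central difference
    value = rho(u0)
    return lambda u: value + slope * (u - u0)
```

For the quadratic cost `rho(u) = (u - 2)**2` linearized at `u0 = 1`, the model is exact at `u0` and deviates from the true cost by 0.01 at `u = 1.1`. The approximation is therefore only trustworthy near `u0`, which is why the linearization is repeated inside the iterative process.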


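Next, we apply the convex optimization technique in [3] to the
approximated convex function (13), and the primal-dual update process
is expressed as follows:

$$
\begin{aligned}
\mathbf{p}_{i,n}^{m+1} &= \frac{\mathbf{p}_{i,n}^m + \eta_u (\mathbf{G}_i \mathbf{A}) \mathbf{u}_{i \to i+n}^m}{\max(1, \mathrm{abs}(\mathbf{p}_{i,n}^m + \eta_u (\mathbf{G}_i \mathbf{A}) \mathbf{u}_{i \to i+n}^m))} \\
\mathbf{u}_{i \to i+n}^{m+1} &= \left( \mathbf{u}_{i \to i+n}^m - \epsilon_u (\mathbf{G}_i \mathbf{A})^T \mathbf{p}_{i,n}^{m+1} \right) - \epsilon_u \nabla_{i,n} \rho(\mathbf{x}, \mathbf{u}_0),
\end{aligned}
\tag{14}
$$

where $\mathbf{p}_{i,n}$ denotes the dual variable of
$\mathbf{u}_{i \to i+n}$ on the vector space, and the diagonal matrix
$\mathbf{G}_i$ is the weighting matrix denoted as
$\mathbf{G}_i = \mathrm{diag}(g_i(\mathbf{x}))$. Parameters $\eta_u$
and $\epsilon_u$ denote the update steps, and
$\nabla_{i,n} \rho(\mathbf{x}, \mathbf{u}_0)$ means
$\left. \frac{\partial \rho(\mathbf{x}, \mathbf{u})}{\partial \mathbf{u}_{i \to i+n}} \right|_{\mathbf{u}_0}$.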


4. Implementation Details 

To handle large blurs and guide fast convergence, we implement our
algorithm in the traditional coarse-to-fine framework with empirically
determined parameters. We use $\lambda = 250$ for most of our
experiments, and the other parameters are determined as
$\mu_n = \lambda$, $\nu = 0.08\lambda$, $\sigma_I$ = ^, and $N = 2$.
In the coarse-to-fine framework, we build an image pyramid with 17
levels for a high-definition (1280x720) video with a scale factor of
0.9, and we use bi-cubic interpolation to propagate both the optical
flows and the latent frames to the next pyramid level.
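
The pyramid geometry implied by these settings can be tabulated with a short sketch. Rounding each dimension to the nearest integer is an assumption of this example; the text does not state how the level sizes are rounded.

```python
def pyramid_sizes(width, height, levels=17, scale=0.9):
    """List (width, height) per level, finest first, shrinking by `scale`."""
    sizes = []
    for level in range(levels):
        factor = scale ** level
        sizes.append((max(1, round(width * factor)),
                      max(1, round(height * factor))))
    return sizes
```

Under this rounding, a 1280x720 video with 17 levels and a scale factor of 0.9 has a coarsest level of about 237x133 pixels, so a blur of several tens of pixels at full resolution shrinks to roughly a fifth of its length at the start of the optimization.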

Moreover, to reduce the number of unknowns in optical flows, we only
estimate $\mathbf{u}_{i \to i+1}$ and $\mathbf{u}_{i \to i-1}$. We
approximate $\mathbf{u}_{i \to i+2}$ using $\mathbf{u}_{i \to i+1}$
and $\mathbf{u}_{i+1 \to i+2}$. For example, it satisfies
$\mathbf{u}_{i \to i+2} = \mathbf{u}_{i \to i+1} + \mathbf{u}_{i+1 \to i+2}$,
as illustrated in Fig. 5, and we can easily apply this for $n \neq 1$.
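
A minimal sketch of this chaining for one-dimensional flows is given below. The text states the relation as a plain sum; sampling the second flow at the position where the pixel lands in the intermediate frame (with nearest-neighbour rounding and border clamping) is an assumption of this example, and it reduces to the plain sum when the flows are spatially constant.

```python
def compose_flows(flow_ab, flow_bc):
    """Approximate the flow a->c by chaining a->b and b->c (1-D, per pixel)."""
    last = len(flow_bc) - 1
    composed = []
    for x, step in enumerate(flow_ab):
        # Look up the second flow where pixel x lands in frame b (clamped).
        y = min(max(int(round(x + step)), 0), last)
        composed.append(step + flow_bc[y])
    return composed
```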












The overall process of our algorithm is summarized in Algorithm 1.
Further details on estimating the duty cycle and on the
post-processing step that reduces artifacts are given below.


Algorithm 1 Overview of the proposed method

Input: Blurry frames B
Output: Latent frames L and optical flows u
1: Initialize duty cycle $\tau_i$ and optical flows u. (Sec. 4.1)
2: Build image pyramid.
3: Restore sharp video with fixed u. (Sec. 3.1)
4: Estimate optical flows with fixed L. (Sec. 3.2)
5: Detect occlusion and perform post-processing. (Sec. 4.2)
6: Propagate variables to the next pyramid level if it exists.
7: Repeat steps 3-6 from the coarse to the fine pyramid level.
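
The control flow of steps 3 to 7 can be sketched as a small driver that alternates the two sub-problems on each pyramid level. The callables `restore`, `estimate_flow`, `refine`, and `propagate`, as well as the opaque `state` object, are hypothetical placeholders for the solvers of Secs. 3.1, 3.2, and 4.2; the sketch shows only the ordering of the steps.

```python
def alternate_coarse_to_fine(levels, restore, estimate_flow, refine,
                             propagate, state):
    """Run steps 3-6 of Algorithm 1 once per pyramid level, coarsest first."""
    for index, level in enumerate(levels):
        state = restore(state, level)        # step 3: update L with u fixed
        state = estimate_flow(state, level)  # step 4: update u with L fixed
        state = refine(state, level)         # step 5: occlusion-aware filtering
        if index + 1 < len(levels):
            # Step 6: hand the variables over to the next finer level.
            state = propagate(state, levels[index + 1])
    return state
```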


4.1. Duty Cycle Estimation 

In this study, we assume that the camera duty cycle is known for every
frame. We can obtain the duty cycle from the public SDK when we use
Kinect to capture RGB videos. However, when we conduct deblurring with
conventional data sets, which do not provide exposure information, we
apply the technique proposed in [7] to estimate the duty cycle.
Contrary to the original method in [7], we use optical flows instead
of homographies to obtain initially approximated blur kernels.
Therefore, we first estimate flow fields from blurry images with [26],
which runs in near real-time. We then use them as initial flows and
approximate the kernels to estimate the duty cycle.

4.2. Occlusion Detection and Refinement 

Our piece-wise linear kernel naturally results in approximation
errors, which cause problems such as ringing artifacts. Moreover, our
data model in (4) and our temporal coherence model in (5) are invalid
at occluded regions.

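To reduce such artifacts from kernel errors and occlusions, we use
spatio-temporal filtering as a post-processing step:

$$\mathbf{L}_i(\mathbf{x}) = \frac{1}{Z(\mathbf{x})} \sum_{n=-N}^{N} \sum_{\mathbf{y}} w_{i,n}(\mathbf{x}, \mathbf{y}) \, \mathbf{L}_{i+n}(\mathbf{y}), \tag{15}$$

where $\mathbf{y}$ denotes a pixel in the 3x3 neighboring patch at
location $(\mathbf{x} + \mathbf{u}_{i \to i+n})$ and $Z$ is the
normalization factor (e.g.,
$Z(\mathbf{x}) = \sum_{n=-N}^{N} \sum_{\mathbf{y}} w_{i,n}(\mathbf{x}, \mathbf{y})$).
Notably, we allow $n = 0$ in (15) for spatial filtering. Our
occlusion-aware weight $w_{i,n}$ is defined as follows:

$$w_{i,n}(\mathbf{x}, \mathbf{y}) = o_{i,n}(\mathbf{x}, \mathbf{y}) \cdot \exp\left( -\frac{\| P_i(\mathbf{x}) - P_{i+n}(\mathbf{y}) \|^2}{2 \sigma_w^2} \right), \tag{16}$$

where the occlusion state
$o_{i,n}(\mathbf{x}, \mathbf{y}) \in \{0, 0.5, 1\}$ is determined
using the method proposed in [15]. The 5x5 patch $P_i(\mathbf{x})$ is
centered at $\mathbf{x}$ in frame $i$. The similarity control
parameter $\sigma_w$ is fixed as $\sigma_w = 25/255$.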
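
The weighting and normalization of this filter can be sketched for flattened patches as follows. The helper names are hypothetical, and the Gaussian form with a $2\sigma_w^2$ denominator is an assumption of this example; the occlusion state is passed in directly rather than computed.

```python
import math


def patch_weight(patch_a, patch_b, occlusion=1.0, sigma_w=25.0 / 255.0):
    """Weight o * exp(-||Pa - Pb||^2 / (2 * sigma_w^2)) for flattened patches."""
    dist2 = sum((a - b) ** 2 for a, b in zip(patch_a, patch_b))
    return occlusion * math.exp(-dist2 / (2.0 * sigma_w ** 2))


def filtered_value(candidates):
    """Normalized weighted average over (weight, intensity) pairs."""
    total = sum(w for w, _ in candidates)
    if total == 0.0:
        return None  # every candidate is occluded; caller keeps the old value
    return sum(w * v for w, v in candidates) / total
```

Identical patches receive the full weight, a fully occluded candidate (state 0) contributes nothing regardless of patch similarity, and the division by the summed weights plays the role of $Z(\mathbf{x})$.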


5. Experimental Results 

In what follows, we demonstrate the superiority of the 
proposed method. (For more results, see the supplementary 
video.) 

First, we compare our deblurring results with those of the
state-of-the-art exemplar-based method [7] on the videos used in [ ].
As shown in Fig. 6, the captured scenes are dynamic and contain
multiple moving objects. The method [ ] fails to restore the moving
objects because the object motions are large and distinct from the
backgrounds. By contrast, our results show better performance in
deblurring moving objects and backgrounds. This exemplar-based
approach also fails to handle large blurs, as shown in Fig. 7, as the
initially estimated homographies in the largely blurred images are
inaccurate. Moreover, this approach renders excessively smooth results
for mid-frequency textures such as trees, as the method is based on
interpolation without a spatial prior for latent frames.

Next, we compare our method with the state-of-the-art
segmentation-based approach [28]. In Fig. 8, the captured scene is a
bi-layer scene used in [28]. Although the bi-layer scene is a good
example to verify the performance of the layered model, inaccurate
segmentation near the boundaries causes serious artifacts in the
restored frame. By contrast, our method does not depend on accurate
segmentation and thus restores the boundaries much better than the
layered model.

In Fig. 9, we quantitatively compare the optical flow accuracies with
[24] on synthetic blurry images. Although [24] was proposed to handle
blurry images in optical flow estimation, its assumption does not hold
at motion boundaries, which are very important for deblurring.
Therefore, its optical flow is inaccurate at the motion boundaries of
moving objects. By contrast, our model allows abrupt changes of
motions and thus performs better than the previous model.
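
For reference, the average end point error (EPE) used in this comparison is the mean Euclidean distance between the estimated and the ground-truth flow vectors. A minimal sketch follows, with flows given as lists of `(u, v)` pairs; the function name is illustrative.

```python
import math


def average_epe(flow_est, flow_gt):
    """Mean Euclidean distance between estimated and ground-truth flow vectors."""
    errors = [math.hypot(ue - ug, ve - vg)
              for (ue, ve), (ug, vg) in zip(flow_est, flow_gt)]
    return sum(errors) / len(errors)
```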

Moreover, we show the deblurring results with and without the temporal
coherence term in (5), and verify in Fig. 10 that our temporal
coherence model significantly reduces ringing artifacts near the
edges.

Other deblurring results from numerous real videos are 
shown in Fig. 11. Notably, our model successfully restores 
the face which has highly non-uniform blurs because the 
person moves rotationally (Fig. 11(e)). 

6. Conclusions 

In this study, we introduced a novel method that removes general blurs
in dynamic scenes, which conventional methods fail to do. We handled
general blurs by estimating a pixel-wise kernel using optical flows,
and we therefore proposed a new energy model that jointly estimates
optical flows and latent frames.

We also provided a framework and efficient solvers to minimize the
energy function and achieved significant improvements in removing
general blurs in dynamic scenes.

Figure 6: Left to right: Blurry frames of dynamic scenes, deblurring results of [7], and our results.

Figure 7: Left to right: Blurry frame, deblurring result of [7], and ours.

Figure 8: Comparison with segmentation-based approach. Left to right: Blurry frame, result of [28], and ours.

Figure 9: EPE denotes average end point error. (a) Color-coded ground truth optical flow between blurry images. (b) Optical flow estimation result of [24]. (c) Our result.

Figure 10: (a) Real blurry frame of a video. (b) Our deblurring result without using $E_{temporal}$. (c) Our deblurring result with $E_{temporal}$.



Figure 11: Left to right: Numerous real blurry frames and our deblurring results. (a)-(b) Data sets used in [7]. (c)-(e) Captured RGB data sets using Kinect.